---
title: Cleanlab
slug: /bundles-cleanlab
---

import Icon from "@site/src/components/icon";
import PartialParams from '@site/docs/_partial-hidden-params.mdx';

[Cleanlab](https://www.cleanlab.ai/) adds automation and trust to every data point going in and every prediction coming out of AI and RAG solutions.

Use the Cleanlab components to integrate Cleanlab Evaluations with Langflow and unlock trustworthy agentic, RAG, and LLM pipelines with Cleanlab's evaluation and remediation suite.

You can use these components to quantify the trustworthiness of any LLM response with a score between `0` and `1`, and explain why a response may be good or bad. For RAG or agent pipelines with context, you can evaluate context sufficiency, groundedness, helpfulness, and query clarity with quantitative scores. Additionally, you can remediate low-trust responses with warnings or fallback answers.

Authentication is required with a Cleanlab API key.

## Cleanlab Evaluator

The **Cleanlab Evaluator** component evaluates and explains the trustworthiness of a prompt and response pair using Cleanlab. For more information on how the score works, see the [Cleanlab documentation](https://help.cleanlab.ai/tlm/).

### Cleanlab Evaluator parameters

<PartialParams />

| Name                    | Type       | Description                        |
|-------------------------|------------|------------------------------------|
| system_prompt           | Message    | Input parameter. The system message prepended to the prompt. Optional. |
| prompt                  | Message    | Input parameter. The user-facing input to the LLM.  |
| response                | Message    | Input parameter. The model's response to evaluate.    |
| cleanlab_api_key        | Secret     | Input parameter. Your Cleanlab API key.  |
| cleanlab_evaluation_model | Dropdown   | Input parameter. Evaluation model used by Cleanlab, such as GPT-4 or Claude. This doesn't need to be the same model that generated the response. |
| quality_preset          | Dropdown   | Input parameter. Tradeoff between evaluation speed and accuracy. |

### Cleanlab Evaluator outputs

The **Cleanlab Evaluator** component has three possible outputs.

| Name                    | Type       | Description            |
|-------------------------|------------|-------------------------|
| score                   | number, float | Displays the trust score between 0 and 1.  |
| explanation             | `Message`    | Provides an explanation of the trust score. |
| response                | `Message`    | Returns the original response for easy chaining to the **Cleanlab Remediator** component. |

## Cleanlab Remediator

The **Cleanlab Remediator** component uses the trust score from the [**Cleanlab Evaluator** component](#cleanlab-evaluator) to determine whether to show, warn about, or replace an LLM response.

This component has parameters for the score threshold, warning text, and fallback message that you can customize as needed.

The output is **Remediated Response** (`remediated_response`), which is a `Message` containing the final message shown to the user after remediation logic is applied.

### Cleanlab Remediator parameters

| Name                        | Type       | Description |
|-----------------------------|------------|---------|
| response                    | Message    | Input parameter. The response to potentially remediate.  |
| score                       | Number     | Input parameter. The trust score from `CleanlabEvaluator`. |
| explanation                 | Message    | Input parameter. The explanation to append if a warning is shown. Optional.|
| threshold                   | Float      | Input parameter. The minimum trust score to pass a response unchanged.  |
| show_untrustworthy_response | Boolean      | Input parameter. Whether to display or hide the original response with a warning if a response is deemed untrustworthy. |
| untrustworthy_warning_text  | Prompt     | Input parameter. The warning text for untrustworthy responses. |
| fallback_text              | Prompt     | Input parameter. The fallback message if the response is hidden. |

## Cleanlab RAG Evaluator

The **Cleanlab RAG Evaluator** component evaluates RAG and LLM pipeline outputs for trustworthiness, context sufficiency, response groundedness, helpfulness, and query ease using [Cleanlab's evaluation metrics](https://help.cleanlab.ai/tlm/use-cases/tlm_rag/).

You can pair this component with the [**Cleanlab Remediator** component](#cleanlab-remediator) to remediate low-trust responses coming from the RAG pipeline.

### Cleanlab RAG Evaluator parameters

<PartialParams />

| Name                        | Type       | Description |
|-----------------------------|------------|------------|
| cleanlab_api_key           | Secret     | Input parameter. Your Cleanlab API key.    |
| cleanlab_evaluation_model  | Dropdown   | Input parameter. The evaluation model used by Cleanlab, such as GPT-4, or Claude. This doesn't need to be the same model that generated the response. |
| quality_preset             | Dropdown   | Input parameter. The tradeoff between evaluation speed and accuracy.  |
| context                    | Message    | Input parameter. The retrieved context from your RAG system.   |
| query                      | Message    | Input parameter. The original user query.   |
| response                   | Message    | Input parameter. The model's response based on the context and query. |
| run_context_sufficiency    | Boolean      | Input parameter. Evaluate whether context supports answering the query.  |
| run_response_groundedness  | Boolean      | Input parameter. Evaluate whether the response is grounded in the context. |
| run_response_helpfulness   | Boolean      | Input parameter. Evaluate how helpful the response is.  |
| run_query_ease            | Boolean      | Input parameter. Evaluate if the query is vague, complex, or adversarial. |

### Cleanlab RAG Evaluator outputs

The **Cleanlab RAG Evaluator** component has the following output options:

| Name               | Type       | Description              |
|--------------------|------------|--------------------------|
| trust_score        | Number     | The overall trust score. |
| trust_explanation  | Message    | The explanation for the trust score. |
| other_scores       | Dictionary | A dictionary of optional enabled RAG evaluation metrics. |
| evaluation_summary | Message    | A Markdown summary of query, context, response, and evaluation results. |
| response           | Message    | Returns the original response for easy chaining to the **Cleanlab Remediator** component. |

## Example Cleanlab flows

The following example flows show how to use the **Cleanlab Evaluator** and **Cleanlab Remediator** components to evaluate and remediate responses from any LLM, and how to use the **Cleanlab RAG Evaluator** component to evaluate RAG pipeline outputs.

### Evaluate and remediate responses from an LLM

This flow evaluates and remediates the trustworthiness of a response from any LLM using the **Cleanlab Evaluator** and **Cleanlab Remediator** components.

You can [download the Evaluate and Remediate flow](/files/eval_and_remediate_cleanlab.json), and then [import the flow](/concepts-flows-import) to your Langflow instance.
Or, you can build the flow from scratch by connecting the following components:

* Connect the `Message` output from any **Language Model** or **Agent** component to the **Response** input of the **Cleanlab Evaluator** component.
* Connect a **Prompt Template** component to the **Cleanlab Evaluator** component's **Prompt** input.

<!-- Components missing per image -->

![Evaluate response trustworthiness](/img/eval_response.png)

When you run the flow, the **Cleanlab Evaluator** component returns a trust score and explanation from the flow.

The **Cleanlab Remediator** component uses this trust score to determine whether to output the original response, warn about it, or replace it with a fallback answer.

This example shows a response that was determined to be untrustworthy (a score of `.09`) and flagged with a warning by the **Cleanlab Remediator** component.

![Cleanlab Remediator Example](/img/cleanlab_remediator_example.png)

To hide untrustworthy responses, configure the **Cleanlab Remediator** component to replace the response with a fallback message.

![Cleanlab Remediator Example](/img/cleanlab_remediator_example_fallback.png)

### Evaluate RAG pipeline

As an example, create a flow based on the **Vector Store RAG** template, and then add the **Cleanlab RAG Evaluator** component to evaluate the flow's context, query, and response.
Connect the **context**, **query**, and **response** outputs from the other components in the RAG flow to the **Cleanlab RAG Evaluator** component.

![Evaluate RAG pipeline](/img/eval_rag.png)

Here is an example of the `Evaluation Summary` output from the **Cleanlab RAG Evaluator** component:

![Evaluate RAG pipeline](/img/eval_summary_rag.png)

The `Evaluation Summary` includes the query, context, response, and all evaluation results. In this example, the `Context Sufficiency` and `Response Groundedness` scores are low (a score of `0.002`) because the context doesn't contain information about the query, and the response isn't grounded in the context.